VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition
Reliable facial expression recognition plays a critical role in human-machine
interactions. However, most of the facial expression analysis methodologies
proposed to date pay little or no attention to the protection of a user's
privacy. In this paper, we propose a Privacy-Preserving Representation-Learning
Variational Generative Adversarial Network (PPRL-VGAN) to learn an image
representation that is explicitly disentangled from the identity information.
At the same time, this representation is discriminative from the standpoint of
facial expression recognition and generative as it allows expression-equivalent
face image synthesis. We evaluate the proposed model on two public datasets
under various threat scenarios. Quantitative and qualitative results
demonstrate that our approach strikes a balance between the preservation of
privacy and data utility. We further demonstrate that our model can be
effectively applied to other tasks such as expression morphing and image
completion.
BSUV-Net: a fully-convolutional neural network for background subtraction of unseen videos
Background subtraction is a basic task in computer vision and video processing, often applied as a pre-processing step for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have been proposed; however, nearly all of the top-performing ones are supervised. Crucially, their success relies upon the availability of some annotated frames of the test video during training. Consequently, their performance on completely “unseen” videos is undocumented in the literature. In this work, we propose a new, supervised, background-subtraction algorithm for unseen videos (BSUV-Net) based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales, along with their semantic segmentation maps. In order to reduce the chance of overfitting, we also introduce a new data-augmentation technique which mitigates the impact of illumination difference between the background frames and the current frame. On the CDNet-2014 dataset, BSUV-Net outperforms state-of-the-art algorithms evaluated on unseen videos in terms of several metrics, including F-measure, recall and precision.
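The abstract describes a network input built by stacking the current frame, two background frames captured at different time scales, and their semantic segmentation maps. A minimal sketch of such channel-wise stacking (the channel ordering and single-channel segmentation maps are assumptions for illustration, not the paper's exact specification):

```python
import numpy as np

def build_bsuv_input(current, bg_recent, bg_empty,
                     seg_current, seg_recent, seg_empty):
    """Stack three RGB frames and their segmentation maps along the
    channel axis to form a single network input.

    current, bg_recent, bg_empty : (H, W, 3) float arrays in [0, 1]
    seg_* : (H, W, 1) per-pixel segmentation/foreground probability maps
    Returns an (H, W, 12) array suitable as input to a fully-convolutional net.
    """
    return np.concatenate(
        [bg_empty, seg_empty, bg_recent, seg_recent, current, seg_current],
        axis=-1)
```

Because the network is fully convolutional, inputs assembled this way can have arbitrary spatial size.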
Spatio-Visual Fusion-Based Person Re-Identification for Overhead Fisheye Images
Person re-identification (PRID) has been thoroughly researched in typical
surveillance scenarios where various scenes are monitored by side-mounted,
rectilinear-lens cameras. To date, few methods have been proposed for fisheye
cameras mounted overhead and their performance is lacking. In order to close
this performance gap, we propose a multi-feature framework for fisheye PRID
where we combine deep-learning, color-based and location-based features by
means of novel feature fusion. We evaluate the performance of our framework for
various feature combinations on FRIDA, a public fisheye PRID dataset. The
results demonstrate that our multi-feature approach outperforms recent appearance-based deep-learning methods by almost 18 percentage points and location-based methods by almost 3 percentage points in matching accuracy. We also demonstrate the potential application of the proposed PRID framework to people counting in large, crowded indoor spaces.
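The abstract does not detail the novel fusion mechanism, but the general idea of combining deep, color-based and location-based cues can be illustrated with a generic, hypothetical score-level fusion: each feature produces a query-by-gallery distance matrix, the matrices are normalized and summed, and each query is matched to its nearest gallery identity.

```python
import numpy as np

def fuse_distance_matrices(dist_mats, weights=None):
    """Min-max normalize each per-feature distance matrix (queries x gallery)
    and combine them as a weighted sum."""
    if weights is None:
        weights = [1.0] * len(dist_mats)
    fused = np.zeros_like(np.asarray(dist_mats[0], dtype=float))
    for d, w in zip(dist_mats, weights):
        d = np.asarray(d, dtype=float)
        span = d.max() - d.min()
        if span > 0:
            fused += w * (d - d.min()) / span
    return fused

def match(fused):
    """Nearest-gallery identity for each query row."""
    return np.argmin(fused, axis=1)
```

This is an illustrative sketch only; the paper's actual fusion is its own contribution and may differ substantially.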
Action Recognition in Video by Covariance Matching of Silhouette Tunnels
Action recognition is a challenging problem in video analytics due to event complexity, variations in imaging conditions, and intra- and inter-individual action-variability. Central to these challenges is the way one models actions in video, i.e., action representation. In this paper, an action is viewed as a temporal sequence of local shape-deformations of centroid-centered object silhouettes, i.e., the shape of the centroid-centered object silhouette tunnel. Each action is represented by the empirical covariance matrix of a set of 13-dimensional normalized geometric feature vectors that capture the shape of the silhouette tunnel. The similarity of two actions is measured in terms of a Riemannian metric between their covariance matrices. The silhouette tunnel of a test video is broken into short overlapping segments, and each segment is classified using a dictionary of labeled action covariance matrices and the nearest-neighbor rule. On a database of 90 short video sequences this attains a correct classification rate of 97%, which is very close to the state of the art, at almost 5-fold reduced computational cost. Majority-vote fusion of segment decisions achieves a 100% classification rate. Keywords: video analysis; action recognition; silhouette tunnel; covariance matching; generalized eigenvalues
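The covariance-matching pipeline in the abstract can be sketched as follows. The 13 geometric silhouette features are not reproduced here (random vectors stand in), and the distance shown is the standard affine-invariant Riemannian metric computed from the generalized eigenvalues of a pair of covariance matrices, which matches the "generalized eigenvalues" keyword:

```python
import numpy as np

def covariance_descriptor(features):
    """Empirical covariance of an (N, d) set of geometric feature vectors
    (d = 13 in the paper; any d works here)."""
    return np.cov(features, rowvar=False)

def riemannian_distance(C1, C2):
    """Affine-invariant metric between SPD matrices:
    sqrt(sum_i ln^2 lambda_i), where lambda_i are the generalized
    eigenvalues of the pencil (C1, C2), i.e. eigenvalues of inv(C2) @ C1."""
    lam = np.linalg.eigvals(np.linalg.solve(C2, C1)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def classify(test_cov, dictionary):
    """Nearest-neighbor rule over a dictionary {label: covariance}."""
    return min(dictionary,
               key=lambda lbl: riemannian_distance(test_cov, dictionary[lbl]))
```

Because the generalized eigenvalues of (C1, C2) are the reciprocals of those of (C2, C1), the squared-log sum makes this distance symmetric, as a metric requires.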
Increased transcription in hydroxyurea-treated root meristem cells of Vicia faba
Hydroxyurea (HU), an inhibitor of ribonucleotide reductase, prevents cells from progressing through S phase by depleting deoxyribonucleoside triphosphates. Concurrently, disruption of DNA replication leads to double-strand DNA breaks. In root meristems of Vicia faba, HU triggers cell-cycle arrest (preferentially in G1/S phase) and changes overall metabolism by globally activating transcription in both the nucleoplasmic and nucleolar regions. The high level of transcription is accompanied by an increase in the content of the RNA polymerase II large subunit (POLR2A). Changes in transcription activation and POLR2A content correlate with post-translational modifications of histones that play a role in opening up chromatin for transcription. An increase in the level of H4 Lys5 acetylation indicates that the global activation of transcription following HU treatment depends on histone modifications.
Behavior subtraction
Background subtraction has been a driving engine for many computer vision and video analytics tasks. Although its many variants exist, they all share the underlying assumption that photometric scene properties are either static or exhibit temporal stationarity. While this works in many applications, the model fails when one is interested in discovering changes in scene dynamics instead of changes in the scene's photometric properties; unusual pedestrian and motor-traffic patterns are but two examples. We propose a new model and computational framework that assume the dynamics of a scene, not its photometry, to be stationary, i.e., a dynamic background serves as the reference for the dynamics of an observed scene. Central to our approach is the concept of an event, which we define as short-term scene dynamics captured over a time window at a specific spatial location in the camera field of view. Unlike in our earlier work, we compute events by time-aggregating vector object descriptors that can combine multiple features, such as object size, direction of movement, speed, etc. We characterize events probabilistically, but use low-memory, low-complexity surrogates in a practical implementation. Using these surrogates amounts to behavior subtraction, a new algorithm for effective and efficient temporal anomaly detection and localization. Behavior subtraction is resilient to spurious background motion, such as due to camera jitter, and is content-blind, i.e., it works equally well on humans, cars, animals, and other objects in both uncluttered and highly cluttered scenes. Clearly, treating video as a collection of events rather than colored pixels opens new possibilities for video analytics.
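The event idea above can be sketched with a deliberately simplified surrogate: per-pixel descriptors (here, scalar motion labels rather than the paper's richer vector descriptors) are time-aggregated over a sliding window, a per-pixel maximum event value is learned from a "normal" training sequence, and test events exceeding that reference are flagged as anomalous:

```python
import numpy as np

def sliding_window_sum(x, w):
    """Time-aggregate a (T, H, W) sequence of per-pixel descriptors over a
    sliding window of length w, yielding (T - w + 1, H, W) 'event' images.
    Uses a cumulative sum so each window costs O(1) per pixel."""
    c = np.concatenate([np.zeros((1,) + x.shape[1:]), np.cumsum(x, axis=0)],
                       axis=0)
    return c[w:] - c[:-w]

def behavior_subtract(train, test, w, tol=0.0):
    """Learn, per pixel, the maximum event value over a training sequence of
    'normal' activity; flag test events that exceed it by more than tol."""
    bg_max = sliding_window_sum(train, w).max(axis=0)
    events = sliding_window_sum(test, w)
    return events > bg_max + tol
```

This is a minimal illustration of the low-memory, low-complexity surrogate idea; the paper's probabilistic event characterization and multi-feature descriptors are richer than this scalar sketch.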
- …